This is an analysis of amaranth (Amaranthus sp.) size data measured by Karen Adams and Abby Dockter. The data consist of diameter and thickness measurements of archaeologically-recovered amaranth seeds from Dyck Cliff Dwelling in the Verde Valley, Arizona, as well as several contemporary accessions from Native Seeds/SEARCH.
The primary research questions addressed here are:
This document is an R Markdown workbook—the document contains all statistical tests reported and interpreted here as well as the code that produced them. Select “Show All Code” from the drop-down menu on the upper-right side of this page to view the code, or click the “Code” button wherever it appears. Please contact Kyle Bocinsky with any questions.
am_data <- readxl::read_excel("./data/Amaranthus size project data_Dec_20_2017.xlsx",
sheet = 1) %>%
dplyr::mutate(`Species` = ifelse(is.na(`Species`),"Unknown",`Species`)) %>%
dplyr::mutate_at(.vars = vars(`Accession ID`:Color),
.funs = ~factor(., levels = unique(.)))
am_ancient <- am_data %>%
dplyr::filter(Age == "Ancient")
am_modern <- am_data %>%
dplyr::filter(Age == "Modern")
am_modern_groups <-
am_modern %>%
group_by(`Accession ID`) %>%
count()
The data consist of 360 samples: 60 archaeological seeds recovered from a cotton twined cloth bag found in the Dyck Cliff Dwelling in the Verde Valley of Arizona, and 30 seeds each from 10 contemporary accessions held by Native Seeds/SEARCH. For each sample, place of origin, color, diameter, and thickness were recorded. Additionally, species is known for the contemporary seed.
am_modern %>%
group_by(`Accession ID`,
`Accession`,
Origin,
Species,
Color) %>%
summarize(Count = n(),
`Average Diameter (mm)` = mean(`Diameter (mm)`) %>% round(digits = 3),
`SD Diameter (mm)` = sd(`Diameter (mm)`) %>% round(digits = 3),
`Average Thickness (mm)` = mean(`Thickness (mm)`) %>% round(digits = 3),
`SD Thickness (mm)` = sd(`Thickness (mm)`) %>% round(digits = 3)) %>%
dplyr::arrange(Accession,
Origin,
Species,
Color) %>%
knitr::kable(format = "html",
col.names = c("ID",
"Accession",
"Origin",
"Species",
"Color",
"Count",
"Mean",
"SD",
"Mean",
"SD")) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F) %>%
kableExtra::add_header_above(c(" " = 6, "Diameter (mm)" = 2, "Thickness (mm)" = 2)) %>%
kableExtra::column_spec(4, italic = T)
| ID | Accession | Origin | Species | Color | Count | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|
| F | C01-005 New Mexico (C006) | Rinconada, New Mexico | A. hypochondriacus | White | 30 | 1.313 | 0.073 | 0.829 | 0.053 |
| K | C01-005 New Mexico (C006) | Rinconada, New Mexico | A. hypochondriacus | White | 30 | 1.376 | 0.058 | 0.837 | 0.047 |
| O | C01-006 Rio San Lorenzo (C007) | Durango, Mexico | A. hypochondriacus | White | 30 | 1.126 | 0.066 | 0.623 | 0.080 |
| A | C02-007 Hopi Red Dye (C002) | Hopi, Arizona | A. cruentus | Black | 30 | 1.253 | 0.066 | 0.723 | 0.054 |
| M | C02-009 Mountain Pima Greens (C004) | Sonora, Mexico | A. cruentus | Black | 30 | 1.285 | 0.080 | 0.732 | 0.053 |
| E | C02-013 Alegria (C008) | Morelos, Mexico | A. cruentus | White | 30 | 1.368 | 0.065 | 0.823 | 0.075 |
| J | C02-013 Alegria (C008) | Morelos, Mexico | A. cruentus | Black | 30 | 1.276 | 0.087 | 0.721 | 0.055 |
| Q | C02-014 Marbled (C016) | Morelos, Mexico | A. cruentus | White | 30 | 1.347 | 0.101 | 0.815 | 0.063 |
| U | C02-014 Marbled (C016) | Morelos, Mexico | A. cruentus | Black | 30 | 1.357 | 0.082 | 0.771 | 0.045 |
| G | C02-019 Paiute (C009) | Kaibab Southern Paiute Reservation, Utah | A. cruentus | Black | 30 | 1.288 | 0.050 | 0.756 | 0.029 |
The modern samples from Native Seeds/SEARCH were all analyzed by Abby Dockter under 32x magnification using a Zeiss binocular microscope using an eyepiece micrometer. Several of the group IDs come from the same Native Seeds/SEARCH accession.
am_ancient %>%
group_by(Analyst,
Color) %>%
summarize(Count = n(),
`Average Diameter (mm)` = mean(`Diameter (mm)`) %>% round(digits = 3),
`SD Diameter (mm)` = sd(`Diameter (mm)`) %>% round(digits = 3),
`Average Thickness (mm)` = mean(`Thickness (mm)`) %>% round(digits = 3),
`SD Thickness (mm)` = sd(`Thickness (mm)`) %>% round(digits = 3)) %>%
knitr::kable(format = "html",
col.names = c("Analyst",
"Color",
"Count",
"Mean",
"SD",
"Mean",
"SD")) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F) %>%
kableExtra::add_header_above(c(" " = 3, "Diameter (mm)" = 2, "Thickness (mm)" = 2))
| Analyst | Color | Count | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|
| ARD | Black | 30 | 1.317 | 0.103 | 0.709 | 0.104 |
| KRA | Black | 30 | 1.172 | 0.064 | 0.727 | 0.068 |
Half of the ancient samples were analyzed by Karen Adams (“KRA”) under 20x magnification, and half by Abby Dockter (“ARD”) under 32x magnification using a Zeiss binocular microscope using an eyepiece micrometer. There are visible (see below) and statistically significant differences between the two analysts, primarily in the diameter measurements (~0.2 mm). Therefor, both because of the higher precision in measurements and the analyst consistency with the rest of the analyses, we are going to drop observations by Adams, and only use those from Dockter.
out <- am_ancient %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
color = Analyst)) +
geom_smooth(method=lm) +
geom_point()
out %>%
plotly::ggplotly()
Welch two sample t-tests on each dimension demonstrate what we can see visually. There are significant differences among the two analysts along the diameter measurements, but not along the thickness measurements.
bind_rows(
am_ancient %>%
t.test(`Diameter (mm)`~Analyst, data = .) %>%
broom::tidy(),
am_ancient %>%
t.test(`Thickness (mm)`~Analyst, data = .) %>%
broom::tidy()
) %>%
dplyr::mutate(Variable = c("Diameter (mm)","Thickness (mm)")) %>%
dplyr::select(Variable,
statistic,
p.value) %>%
dplyr::rename(`T statistic` = statistic,
`p-value` = p.value) %>%
knitr::kable() %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F)
## Currently generic markdown table using pandoc is not supported.
| Variable | T statistic | p-value |
|---|---|---|
| Diameter (mm) | 6.5946085 | 0.0000000 |
| Thickness (mm) | -0.7857747 | 0.4357106 |
am_data %<>%
dplyr::filter(Analyst == "ARD")
Here, we perform a series of tests to address the research questions above. First, we prepare the data by removing multivariate outliers and confirming multivariate normality. Then, we run pairwise Hotelling’s \(T^2\) tests for all of the groups and analyze the results. Hotelling’s \(T^2\) is a multivariate version of Welch’s two sample t-test.
The statistical analyses below primarily consist of multivariate Hotelling’s \(T^2\) tests which assume multivariate normality within groups and are very sensitive to outliers. Here, we first remove multivariate outliers from each group and then report multivariate skewness and kurtosis for each group as a test of normality.
We determine outliers using the ordered squared robust Mahalanobis distances of the observations against the empirical distribution function of the same distances. The distance calculations are based on the Minimum Covariance Determinant (MCD) estimator as calculated using the covMcd function in the robustbase package. Observations are considered outliers if they exceed the 97.5% quantile of the chi-squared distribution of squared Mahalanobis distances. The graphs below show outlier observations as open dots; these were removed in all further analyses.
# Remove outliers from samples
am_data %<>%
split(., .$`Accession ID`) %>%
purrr::map(function(x){
covr <- robustbase::covMcd(x %>%
dplyr::select(`Diameter (mm)`,
`Thickness (mm)`) %>%
as.matrix())
x %>%
dplyr::mutate(outlier = mahalanobis(cbind(`Diameter (mm)`,
`Thickness (mm)`),
center = covr$center,
cov = covr$cov) > qchisq(0.975, df = 2))
}) %>%
bind_rows()
am_data %>%
dplyr::mutate(`Accession Label` = stringr::str_c(`Accession ID`,": ",Accession,"—",Color)) %>%
dplyr::arrange(Accession,
Origin,
Species,
Color) %>%
dplyr::mutate(`Accession Label` = factor(`Accession Label`, levels = unique(`Accession Label`))) %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
# color = `Accession Label`,
shape = outlier)) +
# geom_smooth(method=lm) +
geom_point() +
scale_shape_manual(values=c(19,1)) +
facet_wrap(~`Accession Label`,
ncol = 2) +
theme(legend.position="none")
am_data %<>%
dplyr::filter(!outlier)
We tested for multivariate normality of each accession using Mardia’s Multivariate Normality Test as available in the mardiaTest function in the MVN package in R. The mardiaTest function calculates Mardia’s multivariate skewness and kurtosis coefficients as well as their corresponding statistical significance, and makes a determination of multivariate normality based on whether there exists significant skewness or kurtosis (or both). We report the results of those tests here for all datasets. This analysis shows that all groups are multivariate normal once outliers have been removed. We also report revised sample means and standard deviations along both dimensions.
am_data %>%
dplyr::group_by(`Accession ID`,
Accession,
Origin,
Species,
Color) %>%
dplyr::summarise(mardia = cbind(`Diameter (mm)`,
`Thickness (mm)`) %>%
as.data.frame() %>%
MVN::mardiaTest() %>%
list(),
Count = n(),
`Average Diameter (mm)` = mean(`Diameter (mm)`) %>% round(digits = 3),
`SD Diameter (mm)` = sd(`Diameter (mm)`) %>% round(digits = 3),
`Average Thickness (mm)` = mean(`Thickness (mm)`) %>% round(digits = 3),
`SD Thickness (mm)` = sd(`Thickness (mm)`) %>% round(digits = 3)
) %>%
dplyr::arrange(Accession,
Origin,
Species,
Color) %>%
dplyr::mutate(Skewness = mardia[[1]]@g1p %>% round(digits = 3),
`Skewness p` = mardia[[1]]@p.value.skew %>% round(digits = 3),
Kurtosis = mardia[[1]]@g2p %>% round(digits = 3),
`Kurtosis p` = mardia[[1]]@p.value.kurt %>% round(digits = 3),
`Multivariate Normal?` = (`Skewness p` > 0.05) & (`Kurtosis p` > 0.05)) %>%
dplyr::ungroup() %>%
dplyr::select(-Origin:-mardia) %>%
dplyr::select(`Accession ID`,
Accession,
Skewness:`Multivariate Normal?`,
`Average Diameter (mm)`:`SD Thickness (mm)`) %>%
knitr::kable(format = "html",
col.names = c("ID",
"Accession",
"Statistic",
"p",
"Statistic",
"p",
"Normal?",
"Mean",
"SD",
"Mean",
"SD")) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F) %>%
kableExtra::add_header_above(c(" " = 2,
"Skewness" = 2,
"Kurtosis" = 2,
" " = 1,
"Diameter (mm)" = 2,
"Thickness (mm)" = 2))
| ID | Accession | Statistic | p | Statistic | p | Normal? | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| F | C01-005 New Mexico (C006) | 0.325 | 0.833 | 6.690 | 0.395 | TRUE | 1.304 | 0.058 | 0.831 | 0.044 |
| K | C01-005 New Mexico (C006) | 0.394 | 0.790 | 6.777 | 0.436 | TRUE | 1.378 | 0.051 | 0.841 | 0.034 |
| O | C01-006 Rio San Lorenzo (C007) | 0.618 | 0.560 | 6.570 | 0.336 | TRUE | 1.127 | 0.067 | 0.627 | 0.078 |
| A | C02-007 Hopi Red Dye (C002) | 0.954 | 0.311 | 6.768 | 0.399 | TRUE | 1.253 | 0.066 | 0.723 | 0.054 |
| M | C02-009 Mountain Pima Greens (C004) | 0.125 | 0.969 | 6.412 | 0.312 | TRUE | 1.279 | 0.082 | 0.731 | 0.047 |
| E | C02-013 Alegria (C008) | 1.116 | 0.285 | 8.219 | 0.887 | TRUE | 1.377 | 0.058 | 0.837 | 0.063 |
| J | C02-013 Alegria (C008) | 0.046 | 0.995 | 7.496 | 0.743 | TRUE | 1.281 | 0.088 | 0.726 | 0.051 |
| Q | C02-014 Marbled (C016) | 0.992 | 0.433 | 5.763 | 0.180 | TRUE | 1.345 | 0.091 | 0.819 | 0.056 |
| U | C02-014 Marbled (C016) | 0.224 | 0.891 | 6.672 | 0.363 | TRUE | 1.357 | 0.082 | 0.771 | 0.045 |
| G | C02-019 Paiute (C009) | 0.993 | 0.346 | 7.834 | 0.914 | TRUE | 1.292 | 0.044 | 0.753 | 0.022 |
| Dyck | Dyck amaranth seeds, uncharred | 0.663 | 0.598 | 6.680 | 0.410 | TRUE | 1.339 | 0.098 | 0.743 | 0.076 |
The plots below each visualize the prepared data, with colors indicating either accession, species, color, or origin, as well as linear trends per accession. Hover your pointer over the data points for more information about them. Click legend elements once to turn them on or off; double-click to isolate an element; double-click again to reveal hidden elements.
(
am_data %>%
dplyr::mutate(`Accession` = stringr::str_c(`Accession ID`,": ",Accession),
`Accession` = factor(`Accession`, levels = unique(`Accession`))) %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
group = `Accession`,
color = `Accession`,
text = stringr::str_c("Accession: ",`Accession`,"\n",
"Origin: ",`Origin`,"\n",
"Species: ",`Species`,"\n",
"Color: ",`Color`,"\n",
"Diameter: ",`Diameter (mm)`," mm \n",
"Thickness: ",`Thickness (mm)`," mm"))) +
geom_smooth(method=lm) +
geom_point()
) %>%
plotly::ggplotly(tooltip = "text")
(
am_data %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
group = `Accession ID`,
color = `Species`,
text = stringr::str_c("Accession: ",`Accession`,"\n",
"Origin: ",`Origin`,"\n",
"Species: ",`Species`,"\n",
"Color: ",`Color`,"\n",
"Diameter: ",`Diameter (mm)`," mm \n",
"Thickness: ",`Thickness (mm)`," mm"))) +
geom_smooth(method=lm) +
geom_point()
) %>%
plotly::ggplotly(tooltip = "text")
(
am_data %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
group = `Accession ID`,
color = `Color`,
text = stringr::str_c("Accession: ",`Accession`,"\n",
"Origin: ",`Origin`,"\n",
"Species: ",`Species`,"\n",
"Color: ",`Color`,"\n",
"Diameter: ",`Diameter (mm)`," mm \n",
"Thickness: ",`Thickness (mm)`," mm"))) +
geom_smooth(method=lm) +
geom_point()
) %>%
plotly::ggplotly(tooltip = "text")
(
am_data %>%
ggplot(mapping = aes(x = `Diameter (mm)`,
y = `Thickness (mm)`,
group = `Accession ID`,
color = `Origin`,
text = stringr::str_c("Accession: ",`Accession`,"\n",
"Origin: ",`Origin`,"\n",
"Species: ",`Species`,"\n",
"Color: ",`Color`,"\n",
"Diameter: ",`Diameter (mm)`," mm \n",
"Thickness: ",`Thickness (mm)`," mm"))) +
geom_smooth(method=lm) +
geom_point()
) %>%
plotly::ggplotly(tooltip = "text")
Here, we perform three analyses. First, we calculate pairwise Hotelling’s \(T^2\) statistics to differentiate the groups. The results of the Hotelling’s \(T^2\) analysis are then reinforced by use of a nonparametric two-sample kernel density based test. Finally, we test whether the groups may be separated by the strength of the interaction between diameter and thickness—i.e., the correlation or slope of linear regression line within each group.
combinations <- expand.grid(am_data$`Accession ID` %>%
unique(),
am_data$`Accession ID` %>%
unique()) %>%
as_tibble()
combinations_char <- combinations %>%
dplyr::mutate_all(.funs = funs(as.character))
pairwise_hotelling <- combinations_char %>%
purrrlyr::by_row(..f = function(x){
test <- Hotelling::hotelling.test(`Diameter (mm)` + `Thickness (mm)`~`Accession ID`,
data = am_data,
pair = c(x$Var1,x$Var2))
return(list(`$T^2$` = test$stats$statistic,
m = test$stats$m,
# df = test$stats$df,
# nx = test$stats$nx,
# nx = test$stats$ny,
# p = test$stats$p,
p = test$pval))
},
.collate = "list") %$%
.out %>%
purrr::transpose() %>%
as.tibble() %>%
tidyr::unnest() %>%
cbind(combinations, .)
pairwise_hotelling.char <- pairwise_hotelling %>%
dplyr::mutate_at(.vars = vars(p), .funs = ~formatC(.,format = "e", digits = 2)) %>%
dplyr::mutate_at(.vars = vars(`$T^2$`,`m`), .funs = ~sprintf("%7.3f", .))
These data were not collected using a balanced sampling strategy, as would be the case in, say, formal crop trials. Instead, data are esoteric—the contemporary data are drawn from a diverse set of geographic locations under unknown cultivation methods. Furthermore, the contemporary populations analyzed are the result of an unknown number of grow-outs for seed preservation by Native Seeds/SEARCH, which may have affected population structure since accessioning. The ancient seeds, of course, have unknown provenience (in the sense that they could have been transported to the Dyck cliff dwelling from an unknown distance). Because of these restrictions, our dataset is not conducive to a multivariate factorial MANOVA design. The most robust analysis we can hope for is to assess pairwise two-sample tests between the accessions; here, we do so not only between the ancient and modern accessions, but among the modern accessions as well. We use the pairwise significance to assess whether the ancient seeds can be distinguished from the modern accessions, and the test statistic as a distance measure between accessions in a cluster analysis among the accessions.
In the pairwise Hotelling’s \(T^2\) test result tables below, the significant results (\(p < 0.05\)) are denoted in red. Pairs that are displayed in black are not significantly different from one another. Most accessions are statistically distinguishable from one another. The Dyck cliff dwelling cannot be statistically distinguished from accession U (C02-014 Marbled (C016); Morelos, Mexico; A. cruentus; black) along only the diameter and thickness dimensions.
pairwise_hotelling.char %>%
dplyr::mutate_at(.vars = vars(`$T^2$`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_hotelling$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`$T^2$`) %>%
tidyr::spread(Var1,`$T^2$`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 0.000 | 43.054 | 146.412 | 112.059 | 174.021 | 139.317 | 33.581 | 121.447 | 27.065 | 109.724 | 166.913 |
| K | 43.054 | 0.000 | 243.019 | 91.152 | 149.284 | 118.279 | 0.070 | 76.483 | 2.713 | 124.656 | 90.998 |
| O | 146.412 | 243.019 | 0.000 | 56.059 | 58.605 | 55.570 | 222.359 | 139.768 | 107.485 | 117.598 | 119.270 |
| A | 112.059 | 91.152 | 56.059 | 0.000 | 3.991 | 5.347 | 62.294 | 30.479 | 53.845 | 7.517 | 46.839 |
| M | 174.021 | 149.284 | 58.605 | 3.991 | 0.000 | 0.991 | 48.364 | 12.629 | 123.985 | 11.983 | 19.422 |
| J | 139.317 | 118.279 | 55.570 | 5.347 | 0.991 | 0.000 | 51.623 | 13.290 | 80.287 | 12.655 | 9.161 |
| E | 33.581 | 0.070 | 222.359 | 62.294 | 48.364 | 51.623 | 0.000 | 28.698 | 2.227 | 50.064 | 38.162 |
| U | 121.447 | 76.483 | 139.768 | 30.479 | 12.629 | 13.290 | 28.698 | 0.000 | 58.074 | 15.372 | 4.558 |
| Q | 27.065 | 2.713 | 107.485 | 53.845 | 123.985 | 80.287 | 2.227 | 58.074 | 0.000 | 55.703 | 89.158 |
| G | 109.724 | 124.656 | 117.598 | 7.517 | 11.983 | 12.655 | 50.064 | 15.372 | 55.703 | 0.000 | 33.876 |
| Dyck | 166.913 | 90.998 | 119.270 | 46.839 | 19.422 | 9.161 | 38.162 | 4.558 | 89.158 | 33.876 | 0.000 |
pairwise_hotelling.char %>%
dplyr::mutate_at(.vars = vars(`$T^2$`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_hotelling$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`m`) %>%
tidyr::spread(Var1,`m`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.490 | 0.490 | 0.490 |
| K | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.489 | 0.490 | 0.490 |
| O | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.490 | 0.491 | 0.490 |
| A | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.490 | 0.491 | 0.491 |
| M | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.489 | 0.490 | 0.490 |
| J | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.490 | 0.490 | 0.490 |
| E | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.490 | 0.490 | 0.490 |
| U | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.491 | 0.490 | 0.491 | 0.491 |
| Q | 0.490 | 0.489 | 0.490 | 0.490 | 0.489 | 0.490 | 0.490 | 0.490 | 0.489 | 0.490 | 0.489 |
| G | 0.490 | 0.490 | 0.491 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.490 | 0.490 | 0.490 |
| Dyck | 0.490 | 0.490 | 0.490 | 0.491 | 0.490 | 0.490 | 0.490 | 0.491 | 0.489 | 0.490 | 0.490 |
pairwise_hotelling.char %>%
dplyr::mutate_at(.vars = vars(`$T^2$`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_hotelling$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`p`) %>%
tidyr::spread(Var1,`p`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 1.00e+00 | 2.26e-07 | 8.88e-16 | 9.38e-14 | 0.00e+00 | 3.77e-15 | 3.04e-06 | 2.14e-14 | 2.73e-05 | 2.72e-13 | 2.22e-16 |
| K | 2.26e-07 | 1.00e+00 | 0.00e+00 | 4.17e-12 | 2.00e-15 | 9.41e-14 | 9.66e-01 | 7.02e-11 | 2.75e-01 | 3.73e-14 | 1.14e-11 |
| O | 8.88e-16 | 0.00e+00 | 1.00e+00 | 4.70e-09 | 3.90e-09 | 7.19e-09 | 0.00e+00 | 8.88e-16 | 6.20e-13 | 4.94e-14 | 6.31e-14 |
| A | 9.38e-14 | 4.17e-12 | 4.70e-09 | 1.00e+00 | 1.51e-01 | 8.17e-02 | 1.32e-09 | 5.93e-06 | 1.50e-08 | 3.15e-02 | 7.07e-08 |
| M | 0.00e+00 | 2.00e-15 | 3.90e-09 | 1.51e-01 | 1.00e+00 | 6.18e-01 | 5.73e-08 | 3.81e-03 | 1.26e-13 | 5.11e-03 | 3.31e-04 |
| J | 3.77e-15 | 9.41e-14 | 7.19e-09 | 8.17e-02 | 6.18e-01 | 1.00e+00 | 2.31e-08 | 2.90e-03 | 9.27e-11 | 3.87e-03 | 1.62e-02 |
| E | 3.04e-06 | 9.66e-01 | 0.00e+00 | 1.32e-09 | 5.73e-08 | 2.31e-08 | 1.00e+00 | 1.19e-05 | 3.44e-01 | 3.40e-08 | 9.23e-07 |
| U | 2.14e-14 | 7.02e-11 | 8.88e-16 | 5.93e-06 | 3.81e-03 | 2.90e-03 | 1.19e-05 | 1.00e+00 | 5.58e-09 | 1.29e-03 | 1.17e-01 |
| Q | 2.73e-05 | 2.75e-01 | 6.20e-13 | 1.50e-08 | 1.26e-13 | 9.27e-11 | 3.44e-01 | 5.58e-09 | 1.00e+00 | 1.37e-08 | 2.94e-11 |
| G | 2.72e-13 | 3.73e-14 | 4.94e-14 | 3.15e-02 | 5.11e-03 | 3.87e-03 | 3.40e-08 | 1.29e-03 | 1.37e-08 | 1.00e+00 | 3.13e-06 |
| Dyck | 2.22e-16 | 1.14e-11 | 6.31e-14 | 7.07e-08 | 3.31e-04 | 1.62e-02 | 9.23e-07 | 1.17e-01 | 2.94e-11 | 3.13e-06 | 1.00e+00 |
pairwise_kde <- combinations_char %>%
purrrlyr::by_row(..f = function(x){
d1 <- am_data %>%
dplyr::filter(`Accession ID` == x$Var1) %>%
dplyr::select(`Diameter (mm)`,`Thickness (mm)`) %>%
as.matrix()
d2 <- am_data %>%
dplyr::filter(`Accession ID` == x$Var2) %>%
dplyr::select(`Diameter (mm)`,`Thickness (mm)`) %>%
as.matrix()
test <- ks::kde.test(x1 = d1,
x2 = d2)
return(list(T = test$Tstat,
z = test$zstat,
p = test$pvalue))
},
.collate = "list") %$%
.out %>%
purrr::transpose() %>%
as.tibble() %>%
tidyr::unnest() %>%
cbind(combinations, .)
pairwise_kde.char <- pairwise_kde %>%
dplyr::mutate_at(.vars = vars(p), .funs = ~formatC(.,format = "e", digits = 2)) %>%
dplyr::mutate_at(.vars = vars(`T`,`z`), .funs = ~sprintf("%7.3f", .))
Kernel density based two-sample tests can provide a more robust comparison between multivariate datasets, as they are nonparametric and asymptotically normal—they compare empirical distributions, as opposed to merely comparing sample means and standard deviations as in parametric tests like Hotelling’s \(T^2\). Here, we use the method developed by Duong, Goud, and Schauer (2012), as provided by the kde.test function in the ks package in R.
The results of the pairwise two-sample kernel density based test are very similar to Hotelling’s \(T^2\); the Dyck samples are distinct from all modern accessions except accession U.
pairwise_kde.char %>%
dplyr::mutate_at(.vars = vars(`T`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_kde$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`T`) %>%
tidyr::spread(Var1,`T`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 0.000 | 70.522 | 64.800 | 81.770 | 104.183 | 68.240 | 32.801 | 61.035 | 58.699 | 159.340 | 73.849 |
| K | 70.522 | 0.000 | 108.832 | 111.185 | 133.074 | 98.802 | 29.517 | 82.265 | 48.864 | 200.358 | 92.750 |
| O | 64.800 | 108.832 | 0.000 | 55.438 | 70.084 | 43.250 | 63.073 | 53.636 | 84.516 | 158.907 | 55.417 |
| A | 81.770 | 111.185 | 55.438 | 0.000 | 41.675 | 28.979 | 71.822 | 35.748 | 89.746 | 90.433 | 66.295 |
| M | 104.183 | 133.074 | 70.084 | 41.675 | 0.000 | 32.233 | 78.266 | 33.473 | 115.050 | 104.105 | 44.517 |
| J | 68.240 | 98.802 | 43.250 | 28.979 | 32.233 | 0.000 | 51.328 | 16.324 | 78.479 | 90.274 | 34.086 |
| E | 32.801 | 29.517 | 63.073 | 71.822 | 78.266 | 51.328 | 0.000 | 38.618 | 44.302 | 151.474 | 47.845 |
| U | 61.035 | 82.265 | 53.636 | 35.748 | 33.473 | 16.324 | 38.618 | 0.000 | 68.763 | 89.607 | 18.116 |
| Q | 58.699 | 48.864 | 84.516 | 89.746 | 115.050 | 78.479 | 44.302 | 68.763 | 0.000 | 159.857 | 83.086 |
| G | 159.340 | 200.358 | 158.907 | 90.433 | 104.105 | 90.274 | 151.474 | 89.607 | 159.857 | 0.000 | 137.042 |
| Dyck | 73.849 | 92.750 | 55.417 | 66.295 | 44.517 | 34.086 | 47.845 | 18.116 | 83.086 | 137.042 | 0.000 |
pairwise_kde.char %>%
dplyr::mutate_at(.vars = vars(`T`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_kde$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`p`) %>%
tidyr::spread(Var1,p) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 9.99e-01 | 1.96e-03 | 2.75e-22 | 2.33e-32 | 2.94e-53 | 6.66e-16 | 2.52e-02 | 7.08e-20 | 1.12e-03 | 1.36e-20 | 2.34e-10 |
| K | 1.96e-03 | 9.88e-01 | 5.65e-11 | 2.35e-10 | 2.85e-13 | 1.20e-07 | 4.86e-01 | 4.97e-06 | 2.47e-01 | 2.45e-17 | 1.84e-05 |
| O | 2.75e-22 | 5.65e-11 | 1.00e+00 | 3.36e-35 | 1.40e-73 | 2.39e-10 | 1.11e-16 | 5.37e-50 | 8.76e-12 | 1.81e-25 | 4.81e-08 |
| A | 2.33e-32 | 2.35e-10 | 3.36e-35 | 1.00e+00 | 3.15e-05 | 5.55e-02 | 1.29e-18 | 2.21e-08 | 1.16e-11 | 4.59e-05 | 8.90e-10 |
| M | 2.94e-53 | 2.85e-13 | 1.40e-73 | 3.15e-05 | 1.00e+00 | 7.92e-02 | 3.36e-19 | 1.70e-04 | 3.65e-18 | 3.79e-06 | 9.16e-03 |
| J | 6.66e-16 | 1.20e-07 | 2.39e-10 | 5.55e-02 | 7.92e-02 | 9.98e-01 | 2.93e-07 | 4.23e-01 | 6.17e-08 | 3.72e-05 | 2.46e-02 |
| E | 2.52e-02 | 4.86e-01 | 1.11e-16 | 1.29e-18 | 3.36e-19 | 2.93e-07 | 9.81e-01 | 1.88e-05 | 3.75e-02 | 8.35e-18 | 5.26e-04 |
| U | 7.08e-20 | 4.97e-06 | 5.37e-50 | 2.21e-08 | 1.70e-04 | 4.23e-01 | 1.88e-05 | 1.00e+00 | 1.84e-07 | 4.96e-06 | 3.91e-01 |
| Q | 1.12e-03 | 2.47e-01 | 8.76e-12 | 1.16e-11 | 3.65e-18 | 6.17e-08 | 3.75e-02 | 1.84e-07 | 9.92e-01 | 6.95e-13 | 1.53e-06 |
| G | 1.36e-20 | 2.45e-17 | 1.81e-25 | 4.59e-05 | 3.79e-06 | 3.72e-05 | 8.35e-18 | 4.96e-06 | 6.95e-13 | 1.00e+00 | 9.32e-12 |
| Dyck | 2.34e-10 | 1.84e-05 | 4.81e-08 | 8.90e-10 | 9.16e-03 | 2.46e-02 | 5.26e-04 | 3.91e-01 | 1.53e-06 | 9.32e-12 | 9.55e-01 |
pairwise_slope <- combinations_char %>%
purrrlyr::by_row(..f = function(x){
d1 <- am_data %>%
dplyr::filter(`Accession ID` == x$Var1) %>%
dplyr::mutate(`Accession ID` = "x")
d2 <- am_data %>%
dplyr::filter(`Accession ID` == x$Var2) %>%
dplyr::mutate(`Accession ID` = "y")
m1 <- lm(`Thickness (mm)` ~ `Diameter (mm)` * `Accession ID`,
data = bind_rows(d1,d2))
m2 <- lm(`Thickness (mm)` ~ `Diameter (mm)` + `Accession ID`,
data = bind_rows(d1,d2))
test <- anova(m1,m2)
return(list(`Δm` = (lm(formula = `Thickness (mm)` ~ `Diameter (mm)`,
data = d1)$coefficients[[2]]) -
(lm(formula = `Thickness (mm)` ~ `Diameter (mm)`,
data = d2)$coefficients[[2]]),
`F` = test$F[[2]],
p = test$`Pr(>F)`[[2]]))
},
.collate = "list") %$%
.out %>%
purrr::transpose() %>%
as.tibble() %>%
tidyr::unnest() %>%
cbind(combinations, .)
pairwise_slope.char <- pairwise_slope %>%
dplyr::mutate_at(.vars = vars(p), .funs = ~formatC(.,format = "e", digits = 2)) %>%
dplyr::mutate_at(.vars = vars(`Δm`,`F`), .funs = ~sprintf("%7.3f", .))
Our final analysis compares the slope of the linear relationship of thickness and diameter for each accession. We ask the question of whether this relationship may be combined with the raw measurements to further define consistent groups. Here, we do so by testing for the interaction between the one of the dependent variables and the factor (accession) using an analysis of covariance (ANCOVA)—we model seed thickness as a response to seed diameter, with and without interaction with accession. If the model with interaction provides a significantly better fit to the data, the slopes within the groups are significantly different.
As above, significant slope differences (\(Δm\)) are shown in red. The results contrast with the analyses above. The relationship of diameter to thickness in Dyck seeds are most similar to those of accessions A (C02-007 Hopi Red Dye (C002); Hopi, Arizona; A. cruentus; black) and E (C02-013 Alegria (C008); Morelos, Mexico; A. cruentus; white), with larger but still statistically insignificant differences with accessions F, K, and Q. In general, accessions from simliar origins have similar slopes—for example, the slopes all the accessions from Morelos, Mexico are mutually indistinguishable. This suggests a relationship between local environment and seed morphology. Future research to assess the consistency of this pattern should be undertaken.
pairwise_slope.char %>%
dplyr::mutate_at(.vars = vars(`Δm`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_slope$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`Δm`) %>%
tidyr::spread(Var1,`Δm`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 0.000 | -0.014 | 0.443 | 0.186 | -0.002 | -0.067 | 0.135 | -0.127 | 0.032 | -0.229 | 0.161 |
| K | 0.014 | 0.000 | 0.457 | 0.200 | 0.013 | -0.053 | 0.150 | -0.112 | 0.046 | -0.215 | 0.176 |
| O | -0.443 | -0.457 | 0.000 | -0.257 | -0.445 | -0.510 | -0.308 | -0.570 | -0.412 | -0.672 | -0.282 |
| A | -0.186 | -0.200 | 0.257 | 0.000 | -0.188 | -0.253 | -0.050 | -0.312 | -0.154 | -0.415 | -0.024 |
| M | 0.002 | -0.013 | 0.445 | 0.188 | 0.000 | -0.065 | 0.137 | -0.125 | 0.033 | -0.227 | 0.163 |
| J | 0.067 | 0.053 | 0.510 | 0.253 | 0.065 | 0.000 | 0.202 | -0.060 | 0.098 | -0.162 | 0.228 |
| E | -0.135 | -0.150 | 0.308 | 0.050 | -0.137 | -0.202 | 0.000 | -0.262 | -0.104 | -0.364 | 0.026 |
| U | 0.127 | 0.112 | 0.570 | 0.312 | 0.125 | 0.060 | 0.262 | 0.000 | 0.158 | -0.102 | 0.288 |
| Q | -0.032 | -0.046 | 0.412 | 0.154 | -0.033 | -0.098 | 0.104 | -0.158 | 0.000 | -0.260 | 0.130 |
| G | 0.229 | 0.215 | 0.672 | 0.415 | 0.227 | 0.162 | 0.364 | 0.102 | 0.260 | 0.000 | 0.390 |
| Dyck | -0.161 | -0.176 | 0.282 | 0.024 | -0.163 | -0.228 | -0.026 | -0.288 | -0.130 | -0.390 | 0.000 |
pairwise_slope.char %>%
dplyr::mutate_at(.vars = vars(`Δm`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_slope$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`F`) %>%
tidyr::spread(Var1,`F`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 0.000 | 0.011 | 7.587 | 2.316 | 0.000 | 0.303 | 0.468 | 1.046 | 0.085 | 2.598 | 1.645 |
| K | 0.011 | 0.000 | 8.070 | 3.193 | 0.021 | 0.206 | 0.561 | 0.884 | 0.242 | 3.317 | 2.082 |
| O | 7.587 | 8.070 | 0.000 | 3.731 | 14.407 | 15.945 | 2.387 | 19.604 | 11.560 | 15.890 | 4.687 |
| A | 2.316 | 3.193 | 3.731 | 0.000 | 5.985 | 7.044 | 0.089 | 10.183 | 3.632 | 13.313 | 0.061 |
| M | 0.000 | 0.021 | 14.407 | 5.985 | 0.000 | 0.726 | 0.795 | 2.432 | 0.339 | 7.232 | 4.157 |
| J | 0.303 | 0.206 | 15.945 | 7.044 | 0.726 | 0.000 | 1.522 | 0.411 | 1.515 | 1.829 | 6.034 |
| E | 0.468 | 0.561 | 2.387 | 0.089 | 0.795 | 1.522 | 0.000 | 2.554 | 0.427 | 3.054 | 0.024 |
| U | 1.046 | 0.884 | 19.604 | 10.183 | 2.432 | 0.411 | 2.554 | 0.000 | 3.608 | 0.682 | 9.209 |
| Q | 0.085 | 0.242 | 11.560 | 3.632 | 0.339 | 1.515 | 0.427 | 3.608 | 0.000 | 7.970 | 2.421 |
| G | 2.598 | 3.317 | 15.890 | 13.313 | 7.232 | 1.829 | 3.054 | 0.682 | 7.970 | 0.000 | 9.473 |
| Dyck | 1.645 | 2.082 | 4.687 | 0.061 | 4.157 | 6.034 | 0.024 | 9.209 | 2.421 | 9.473 | 0.000 |
pairwise_slope.char %>%
dplyr::mutate_at(.vars = vars(`Δm`:`p`), .funs = ~cell_spec(.,
"html",
color = ifelse(as.numeric(pairwise_slope$p) < 0.05,
"red",
"black"))) %>%
dplyr::select(Var1,
Var2,
`p`) %>%
tidyr::spread(Var1,p) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
knitr::kable(escape = FALSE,
row.names = TRUE,
align = "r")
| F | K | O | A | M | J | E | U | Q | G | Dyck | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 1.00e+00 | 9.19e-01 | 8.08e-03 | 1.34e-01 | 9.86e-01 | 5.85e-01 | 4.97e-01 | 3.11e-01 | 7.71e-01 | 1.13e-01 | 2.06e-01 |
| K | 9.19e-01 | 1.00e+00 | 6.45e-03 | 7.98e-02 | 8.85e-01 | 6.52e-01 | 4.57e-01 | 3.51e-01 | 6.25e-01 | 7.47e-02 | 1.56e-01 |
| O | 8.08e-03 | 6.45e-03 | 1.00e+00 | 5.86e-02 | 3.92e-04 | 2.06e-04 | 1.28e-01 | 4.57e-05 | 1.37e-03 | 2.10e-04 | 3.52e-02 |
| A | 1.34e-01 | 7.98e-02 | 5.86e-02 | 1.00e+00 | 1.78e-02 | 1.05e-02 | 7.66e-01 | 2.32e-03 | 6.26e-02 | 6.03e-04 | 8.06e-01 |
| M | 9.86e-01 | 8.85e-01 | 3.92e-04 | 1.78e-02 | 1.00e+00 | 3.98e-01 | 3.77e-01 | 1.25e-01 | 5.63e-01 | 9.76e-03 | 4.71e-02 |
| J | 5.85e-01 | 6.52e-01 | 2.06e-04 | 1.05e-02 | 3.98e-01 | 1.00e+00 | 2.23e-01 | 5.24e-01 | 2.25e-01 | 1.82e-01 | 1.77e-02 |
| E | 4.97e-01 | 4.57e-01 | 1.28e-01 | 7.66e-01 | 3.77e-01 | 2.23e-01 | 1.00e+00 | 1.16e-01 | 5.17e-01 | 8.67e-02 | 8.77e-01 |
| U | 3.11e-01 | 3.51e-01 | 4.57e-05 | 2.32e-03 | 1.25e-01 | 5.24e-01 | 1.16e-01 | 1.00e+00 | 6.34e-02 | 4.13e-01 | 3.78e-03 |
| Q | 7.71e-01 | 6.25e-01 | 1.37e-03 | 6.26e-02 | 5.63e-01 | 2.25e-01 | 5.17e-01 | 6.34e-02 | 1.00e+00 | 7.00e-03 | 1.27e-01 |
| G | 1.13e-01 | 7.47e-02 | 2.10e-04 | 6.03e-04 | 9.76e-03 | 1.82e-01 | 8.67e-02 | 4.13e-01 | 7.00e-03 | 1.00e+00 | 3.44e-03 |
| Dyck | 2.06e-01 | 1.56e-01 | 3.52e-02 | 8.06e-01 | 4.71e-02 | 1.77e-02 | 8.77e-01 | 3.78e-03 | 1.27e-01 | 3.44e-03 | 1.00e+00 |
t.diameter <- am_data %>%
dplyr::filter(`Accession ID` != "O") %>%
t.test(`Diameter (mm)` ~ Age,
data = .)
t.thickness <- am_data %>%
dplyr::filter(`Accession ID` != "O") %>%
t.test(`Thickness (mm)` ~ Age,
data = .)
Taking only the multivariate analyses into account, the amaranth seeds from Dyck cliff dwelling are statistically distinct from all modern accessions analyzed from Native Seeds/SEARCH, with the exception of accession U (C02-014 Marbled (C016); Morelos, Mexico; A. cruentus; black). Both the Dyck seeds and accession U seeds are black. However, when comparing the linear relationship between the diameter and thickness measurements between accessions, the Dyck samples appear closer to accessions A (C02-007 Hopi Red Dye (C002); Hopi, Arizona; A. cruentus; black) and E (C02-013 Alegria (C008); Morelos, Mexico; A. cruentus; white), and are also statistically indistinguishable from accessions F, K, and Q. Accessions F and K are in fact from the same accession of A. hypochondriacus, a white amaranth seed from Rinconada, New Mexico. Accession Q is another amaranth from Morelos, Mexico.
When all modern accessions except accession O (Rio San Lorenzo, a clear outlier) are considered, the Dyck amaranth seeds fall within the normal range of seed diameter (\(t\)=-1.06; \(df\)=27.61; \(p\)=2.97e-01), but are significantly less thick than the modern seeds (\(t\)=2.34; \(df\)=27.94; \(p\)=2.66e-02).
manova_color <- am_data %>%
dplyr::filter(`Origin` == "Morelos, Mexico") %>%
manova(cbind(`Diameter (mm)`,`Thickness (mm)`) ~ Color*Accession,
data = .) %>%
broom::tidy() %>%
magrittr::set_rownames(.,.$term)
anova_color <- am_data %>%
dplyr::filter(`Origin` == "Morelos, Mexico") %>%
manova(cbind(`Diameter (mm)`,`Thickness (mm)`) ~ Color*Accession,
data = .) %>%
aov() %>%
summary() %>%
map(broom::tidy) %>%
map(function(x){magrittr::set_rownames(x,x$term)}) %>%
magrittr::set_names(c("diameter","thickness"))
The graph illustrating colors above clearly suggests a relationship between color and seed thickness among domesticated amaranths. A more quantitative answer to this question likely requires more balanced data than are available in this study, but a portion of the dataset is amenable to exploring an answer. A balanced dataset (or design) is one where all combinations of factor variables are represented within the dataset, preferably with relatively equal sample sizes. The accessions from Native Seeds/SEARCH are from many different locations, and are not balanced. For example, the A. hypochondriacus accessions only contain white seeds (though there are black specimens of A. hypochondriacus accessioned with Native Seeds/SEARCH, but not analyzed for this study), meaning that we cannot access the relationship between color and morphometrics for that species. However, there are two examples of white and black amaranth seeds analyzed from the same accessions: E and J (C02-013 Alegria (C008); Morelos, Mexico; A. cruentus), and Q and U (C02-014 Marbled (C016); Morelos, Mexico; A. cruentus). A multivariate analysis of variance (MANOVA) of the Morelos, Mexico accessions—\(\mathrm{Diameter (mm)} + \mathrm{Thickness (mm)} \sim \mathrm{Color} \times \mathrm{Accession}\)—shows significant multivariate morphometric differences between white and black seeds (\(\Lambda_{Pillai}\)=0.45; \(F_{approx}\)=42.25; \(df\)=102; \(p\)=4.30e-14), and a weaker but still significant influence of the interaction between color and accession on multivariate morphometrics (\(\Lambda_{Pillai}\)=0.1; \(F_{approx}\)=5.96; \(df\)=102; \(p\)=3.55e-03). Univariate ANOVAs on each dimension confirm both color and the interaction between color and accession have significant effects on seed diameter (Color: \(F\)=6.98; \(p\)=9.54e-03; Color × Accession: \(F\)=11.75; \(p\)=8.76e-04) and seed thickness (Color: \(F\)=58.38; \(p\)=1.16e-11; Color × Accession: \(F\)=9; \(p\)=3.39e-03).
Further research under balanced design will illuminate whether these relationships are more general across domesticated amaranths.
Here again we have a balance problem in the data that prohibits us from making a case for discriminating between A. hypochondriacus and A. cruentus based on seed metrics alone. The accessions of A. hypochondriacus analyzed here is small and esoteric—they come from two origins (Rinconada, New Mexico and Durango, Mexico) that are not shared by any of the A. cruentus accessions, and the accession from Durango, Mexico (O: C01-006 Rio San Lorenzo (C007)) has markedly different metrics than all of the other accessions, including the Dyck accessions. There is also a suggestion that the two Rinconada, New Mexico accessions (F and K) may suffer from biased sampling; they come from the same Native Seeds/SEARCH accession (C01-005 New Mexico (C006)), and have the same color, but their means are significantly different from one another according to the multivariate Hotelling’s \(T^2\) test above. A more robust and reliable sample of A. hypochondriacus is necessary in order to accurately discriminate between A. hypochondriacus and A. cruentus in ancient samples.
We can use the statistical test matrices presented above as the basis for a hierarchical agglomerative clustering analyses akin to those performed for genetical data. Here, we standardize the scaled Hotelling’s \(T^2\) and \(\lvertΔm\rvert\) and use them as distance matrices for Ward’s hierarchical clustering method.
fit <- pairwise_hotelling %>%
dplyr::mutate(`$T^2$` = `$T^2$` * m) %>%
dplyr::select(Var1,
Var2,
`$T^2$`) %>%
tidyr::spread(Var1,`$T^2$`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
magrittr::divide_by(.,max(.)) %>%
as.dist() %>%
hclust(method="ward.D2")
fit$height %<>%
magrittr::divide_by(.,max(.))
fit %<>%
dendro_data(type="rectangle")
fit$labels %<>%
dplyr::left_join(am_data %>%
dplyr::select(`Accession ID`,
`Accession`,
Origin,
Species,
Color) %>%
dplyr::distinct(),
by = c("label" = "Accession ID")) %>%
dplyr::mutate(`Accession` = stringr::str_c(`label`,": ",Accession),
`Accession` = factor(`Accession`, levels = unique(`Accession`)))
dendro_plot <- ggplot() +
geom_segment(data = fit$segments,
aes(x = x,
y = y,
xend = xend,
yend = yend)) +
coord_flip() +
scale_y_reverse(limits = c(1.0,-0.6),
breaks = seq(0,1,0.2)) +
ylab("Scaled Distances")+
ggplot2::theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.title.x = element_text(colour = NA),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
The plots below each visualize the data clustered by Hotelling’s \(T^2\), with label colors indicating either species, color, or origin. The distance measure (Hotelling’s \(T^2\)) is standardized to be between zero and one.
Clusters are strongly oriented by color, reinforcing the relationship between color and seed thickness among domesticated amaranths as suggested above. Apparent slight clustering by species is likely due to the fact that all A. hypochondriacus accessions are white. There is no apparent relationship between place of origin and Hotelling’s \(T^2\) distances. The Dyck amaranth seeds cluster with accession U (C02-014 Marbled (C016)).
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Species),
size = 3)
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Color),
size = 3)
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Origin),
size = 3)
fit <- pairwise_slope %>%
dplyr::mutate(`Δm` = abs(`Δm`)) %>%
dplyr::select(Var1,
Var2,
`Δm`) %>%
tidyr::spread(Var1,`Δm`) %>%
magrittr::set_rownames(.,.$Var2) %>%
dplyr::select(-Var2) %>%
magrittr::set_rownames(.,colnames(.)) %>%
magrittr::divide_by(.,max(.)) %>%
as.dist() %>%
hclust(method="ward.D2")
fit$height %<>%
magrittr::divide_by(.,max(.))
fit %<>%
dendro_data(type="rectangle")
fit$labels %<>%
dplyr::left_join(am_data %>%
dplyr::select(`Accession ID`,
`Accession`,
Origin,
Species,
Color) %>%
dplyr::distinct(),
by = c("label" = "Accession ID")) %>%
dplyr::mutate(`Accession` = stringr::str_c(`label`,": ",Accession),
`Accession` = factor(`Accession`, levels = unique(`Accession`)))
dendro_plot <- ggplot() +
geom_segment(data = fit$segments,
aes(x = x,
y = y,
xend = xend,
yend = yend)) +
coord_flip() +
scale_y_reverse(limits = c(1.0,-0.85),
breaks = seq(0,1,0.2)) +
ylab("Scaled Distances")+
ggplot2::theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.title.x = element_text(colour = NA),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
The plots below each visualize the data clustered by \(\lvertΔm\rvert\)—the absolute difference in slopes—with label colors indicating either species, color, or origin. The distance measure (\(\lvertΔm\rvert\)) is standardized to be between zero and one.
As stated above, clustering by slope reveals a close relationship between the Dyck amaranth seeds, accession A (C02-007 Hopi Red Dye (C002)), and accession E (C02-013 Alegria (C008)). The former is intriguing in that it is geographically proximate to the Dyck cliff dwelling; the latter is interesting in that it shares its origin (Morelos, Mexico) with accession U, which clustered with the Dyck samples in the analysis above. Otherwise, there seems to be a slight propensity for accessions to cluster with those of similar color. The relationship between place of origin and slope cluster is inconclusive without further geographic analysis.
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Species),
size = 3)
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Color),
size = 3)
dendro_plot +
geom_text(data = fit$labels,
aes(x = x,
y = y - 0.025,
label = Accession,
hjust = "outward",
color = Origin),
size = 3)
In this analysis, we explored morphometric measurements of several accessions of amaranth seeds—ancient and modern—from the Southwestern United States and Mexico. We calculated pairwise relationships between accessions, and ultimately used these relationships to run a cluster analysis.
We sought to answer four questions:
Are the Dyck seeds statistically distinct from the modern samples?
The Dyck seeds are statistically distinct from all modern accessions, with the exception of accession U (C02-014 Marbled (C016)). While the Dyck seed diameters fall within the normal range of modern accessions, they are significantly less thick than the modern accessions.
How does seed color (white versus black) relate to seed metrics?
Visual assessment of all accessions, and statistical assessment of those from Morelos, Mexico, strongly suggest significant multivariate morphometric differences between white and black seeds. In general, white seeds have larger thickness and diameter measurements than black seeds.
Can species be assigned using seed metrics?
The restricted sample of A. hypochondriacus precludes us from discriminating between A. hypochondriacus and A. cruentus based on seed metrics alone.
Do seed metrics form meaningful clusters of accessions by factor?
Data clustered by Hotelling’s \(T^2\) are strongly grouped by color, reinforcing the relationship between color and seed thickness among domesticated amaranths. Results from clustering by slope similarity were inconclusive. In both cluster analyses, the Dyck samples form rather discrete clusters with accession A (C02-007 Hopi Red Dye (C002)) and two accessions from Morelos, Mexico: E (C02-013 Alegria (C008)) and U (C02-014 Marbled (C016)).
Additional morphometric data documenting the diversity of domesticated amaranths will further illuminate the relationships between species, phenotype, and environment, and will better allow us to estabilish connections between contemporary amaranth populations and their ancient progenitors.